QVAC-14019: feat(diffusion): add img2img generation via in-context conditioning #884
QVAC-13445 Quick Updates for February 24th
Sd loading complete on MacBook Air.
…fferent model types
updated for sd2
got full sdxl to work on Mac
…usion Resolves file-location conflicts for SD3 files added in sd-sd3 branch by placing them under the renamed packages/qvac-lib-infer-diffusion path. Made-with: Cursor
sd3 finished
Rename package directory from packages/qvac-lib-infer-diffusion to packages/lib-infer-diffusion to align with the lib-* naming convention used across the monorepo. Made-with: Cursor
rename: qvac-lib-infer-diffusion -> lib-infer-diffusion
…nto feature-media-generation
gianni-cor
left a comment
Remaining nit: stats report user-requested dimensions instead of actual output dimensions
When SDEdit or FLUX override genParams.width/genParams.height (e.g. user passes explicit 768x768 but the input image is 375x500), the stats at lines 702-725 still read from gen.width/gen.height which hold the original JSON values. Fix: sync gen after each override so stats reflect what was actually generated.
….cpp Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
….cpp Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
gianni-cor
left a comment
Please guard the FLUX img2img entry point in JS. The latest SdModel.cpp change fixed the runtime-stats mismatch, but FLUX img2img can still silently take the wrong native branch when users rely on prediction: 'auto' / omitted prediction.
The addon still decides FLUX vs SDEdit from config_.prediction, not from the model family auto-detected inside stable-diffusion.cpp, so this remains a user-facing footgun. A JS-side validation here would make the failure immediate and actionable.
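A minimal sketch of the kind of JS-side validation meant here. The helper name `assertFluxImg2ImgPrediction` and the `config` / `params` shapes are hypothetical, and treating a configured `llmModel` as the signal for a FLUX model is an assumption for illustration, not something the native layer guarantees.

```javascript
// Hypothetical guard: fail fast instead of silently taking the SDEdit branch.
// Assumption: a FLUX model is recognizable on the JS side by config.llmModel being set.
const FLUX_PREDICTIONS = new Set(['flux2_flow', 'flux_flow'])

function assertFluxImg2ImgPrediction (config, params) {
  const isFlux = Boolean(config.llmModel)
  const isImg2Img = params.init_image != null

  // 'auto' and an omitted prediction are both rejected for FLUX img2img.
  if (isFlux && isImg2Img && !FLUX_PREDICTIONS.has(config.prediction)) {
    throw new Error(
      "FLUX img2img requires an explicit prediction ('flux2_flow' or 'flux_flow'), got: " +
      String(config.prediction)
    )
  }
}
```

Called before the job is submitted, a check like this turns the wrong-branch case into an immediate, actionable error; txt2img and non-FLUX requests pass through unchanged.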
One additional docs/types issue:
/** Noise prediction type override (auto-detected from model by default) */
prediction?: PredictionType
That wording is misleading for FLUX img2img in the current addon implementation. Auto-detection may be sufficient inside …
gianni-cor
left a comment
One more issue that I think needs fixing before merge: readImageDimensions() currently trusts fixed PNG/JPEG offsets without verifying the buffer is long enough. Because the JS img2img path auto-injects width / height from this helper when callers omit them, a truncated/corrupt image can produce bogus dimensions and a misleading request failure instead of a clean decode error.
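To illustrate the bounds checks being requested, here is a sketch of a header parser that refuses to read past the end of the buffer. It is not the addon's actual `readImageDimensions()`; the function name `readImageDimensionsSafe` and the error messages are hypothetical. The JPEG walk is simplified and assumes every segment before the SOFx marker carries a length field.

```javascript
// Sketch only: reads width/height from a PNG IHDR or JPEG SOFx header,
// throwing a clear error on truncated or corrupt input.
function readImageDimensionsSafe (buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength)

  // PNG: 8-byte signature, 4-byte chunk length, 'IHDR', then width (16) and height (20).
  const isPng = buf.length >= 4 && buf[0] === 0x89 && buf[1] === 0x50 && buf[2] === 0x4e && buf[3] === 0x47
  if (isPng) {
    if (buf.length < 24) throw new Error('Truncated PNG: IHDR header incomplete')
    return { width: view.getUint32(16), height: view.getUint32(20) } // big-endian
  }

  // JPEG: SOI (FF D8), then segments of [FF, marker, 2-byte length, payload].
  const isJpeg = buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xd8
  if (isJpeg) {
    let off = 2
    while (off + 4 <= buf.length) {
      if (buf[off] !== 0xff) throw new Error('Corrupt JPEG: expected segment marker')
      const marker = buf[off + 1]
      const segLen = view.getUint16(off + 2) // length includes its own 2 bytes
      if (segLen < 2) throw new Error('Corrupt JPEG: invalid segment length')
      // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC).
      const isSof = marker >= 0xc0 && marker <= 0xcf &&
        marker !== 0xc4 && marker !== 0xc8 && marker !== 0xcc
      if (isSof) {
        // Layout after the marker: length(2), precision(1), height(2), width(2).
        if (off + 9 > buf.length) throw new Error('Truncated JPEG: SOF header incomplete')
        return { width: view.getUint16(off + 7), height: view.getUint16(off + 5) }
      }
      off += 2 + segLen
    }
    throw new Error('Truncated JPEG: no SOF marker found')
  }

  throw new Error('Unsupported image format: expected PNG or JPEG')
}
```

With checks like these, a truncated image fails with a decode-style error before any auto-injected `width` / `height` reaches the request.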
@gianni-cor, addressed the comments. New regression tests are in place to prevent these two bugs from silently regressing in the future.
This reverts commit 8082388.
…ensions
- Add JS-side guard in _runInternal() that throws when init_image is present on a FLUX model (llmModel set) but prediction is not explicitly flux2_flow or flux_flow, preventing silent fallback to the SDEdit branch
- Add buffer-length checks to readImageDimensions() for truncated PNG (require >= 24 bytes) and JPEG (validate segLen >= 2, guard SOF reads)
- Update prediction docstring in index.d.ts to clarify that FLUX img2img requires an explicit prediction value
- Add regression tests for all of the above (13 cases)
Made-with: Cursor
- Update prediction docstring to focus on FLUX.2 img2img guidance
- Remove FLUX.1 from encoder file name comments (keep only relevant models)
- Update error message to reference FLUX.2 only in user-facing guidance
- Keep flux_flow type in PredictionType union for backward compatibility
Made-with: Cursor
Register the new input-validation regression tests in the mobile test runner so truncated image and FLUX prediction guard tests run on all platforms. Made-with: Cursor
Made-with: Cursor
- Bump package version from 0.1.3 to 0.2.0 for img2img feature release
- Update CHANGELOG.md with 0.2.0 entry: FLUX.2 img2img, input validation, regression tests
- Remove stale CHANGELOG (keeping CHANGELOG.md as canonical source)
Made-with: Cursor
Restore default-registry baseline to a9eae49a7c95a63 (matches main). The 87783998cb67fe6 baseline was an unintended change. Made-with: Cursor
/review
❌ E2E Mobile Test Results - iOS
Overall Status: FAILED
Automated E2E mobile testing powered by AWS Device Farm
❌ E2E Mobile Test Results - Android
Overall Status: FAILED
Summary
- Adds img2img generation to lib-infer-diffusion using FLUX in-context conditioning: the reference image is attended to via joint attention, NOT mixed with noise
- Mode is selected automatically (init_image present → img2img, otherwise → txt2img)
How it works
The user passes init_image (a PNG/JPEG Uint8Array) alongside a text prompt. Internally, the JS layer serializes the image as ref_image_bytes, and the C++ layer decodes it and sets it as ref_images for FLUX joint-attention conditioning.
This approach (matching the Iris C engine) produces significantly better results than traditional img2img (VAE encode → add noise → denoise), which loses identity features at high strength and produces artifacts at low strength.
Changes
JS layer
- addon.js - readImageDimensions() extracts width/height from PNG IHDR or JPEG SOFx headers. runJob() serializes the init_image Uint8Array as a ref_image_bytes JSON array and auto-injects dimensions to prevent GGML tensor shape assertions.
- index.js - _runInternal() auto-selects mode: img2img when init_image is present, txt2img otherwise.
C++ layer
- SdModel.cpp - load() sets vae_decode_only = false for the VAE encoder graph. process() decodes ref_image_bytes, sets ref_images + auto_resize_ref_image for FLUX joint-attention conditioning.
- SdGenHandlers.cpp - Mode handler validates txt2img and img2img.
Tests
- test/unit/test_img2img.cpp (301 lines) - JSON round-trip, dimension override, strength bounds, synthetic image pipeline, cancel
- test/unit/test_ref2img.cpp (390 lines) - reference image routing, auto-resize, full FLUX2 generation with real headshot
- test/integration/generate-image-flux2-i2i.test.js (175 lines) - end-to-end FLUX2-klein img2img
Examples
- examples/img2img-flux2.js - FLUX2-klein Q8 img2img
- examples/img2img-flux2-f16.js - FLUX2-klein F16 variant
- examples/img2img-sdxl.js - SDXL img2img
- examples/ref2img-flux2.js - In-context conditioning example
Usage
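A minimal sketch of the request assembly described under Changes, not the package's actual API. The helper `buildGenerationRequest` is hypothetical, the `prompt` field and the `mode` key are illustrative assumptions, and the image dimensions are assumed to have been read from the header beforehand. Only `init_image`, `ref_image_bytes`, `width`, `height` and the txt2img / img2img mode names come from this PR.

```javascript
// Hypothetical helper: selects the mode from the presence of init_image and
// serializes the image bytes into the JSON payload handed to the native layer.
function buildGenerationRequest (params, dims) {
  const { init_image: initImage, ...rest } = params
  const request = { ...rest, mode: initImage ? 'img2img' : 'txt2img' }

  if (initImage) {
    // Uint8Array -> plain array so it survives JSON.stringify as a list of numbers.
    request.ref_image_bytes = Array.from(initImage)
    // Inject dimensions only when the caller omitted them.
    if (request.width == null) request.width = dims.width
    if (request.height == null) request.height = dims.height
  }
  return JSON.stringify(request)
}
```

A call without `init_image` yields a txt2img request with no `ref_image_bytes`; adding `init_image` switches the same call to img2img and fills in any missing `width` / `height`.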
Test Plan
Build
- npm run build - native addon builds successfully
- npm run test:cpp:build - C++ test binary compiles
C++ Unit Tests
- npm run test:cpp:run:unit - SdModelTest + SdBackendSelectionTest
- npm run test:cpp:run:loading - SdModelLoadingTest
- npm run test:cpp:run:inference - SdSingleStepInferenceTest
- npm run test:cpp:run:generation - SdFullGenerationTest
- npm run test:cpp:run - all C++ tests (includes img2img + ref2img + cancel + gen_handlers)
JS Integration Tests
- npm run test:integration - all JS integration tests
- generate-image-flux2-i2i.test.js - FLUX2 img2img end-to-end
- generate-image-flux2.test.js - FLUX2 txt2img
- generate-image-sdxl.test.js - SDXL txt2img
- generate-image-sd3.test.js - SD3 txt2img
- generate-image.test.js - SD1/SD2 txt2img
- model-loading.test.js - model load/unload
- api-behavior.test.js - API behaviour validation
Examples (manual)
- bare examples/img2img-flux2.js - FLUX2 Q8 img2img
- bare examples/img2img-flux2-f16.js - FLUX2 F16 img2img
- bare examples/img2img-sdxl.js - SDXL img2img
- bare examples/ref2img-flux2.js - FLUX2 in-context conditioning
- bare examples/generate-image.js - SD txt2img (regression)
- bare examples/generate-image-sdxl.js - SDXL txt2img (regression)
- bare examples/generate-image-sd3.js - SD3 txt2img (regression)
- bare examples/quickstart.js - quickstart (regression)
Regression
- npm run lint - JS lint passes